Questions
How can I create publication-quality graphics in R?
Objectives
To be able to use ggplot2 to generate publication-quality graphics.
To understand the basic grammar of graphics, including the aesthetics and geometry layers, adding statistics, transforming scales, and colouring or panelling by groups.
Plotting the data is one of the best ways to quickly explore it and generate hypotheses about various relationships between variables.
There are several plotting systems in R, but today we will focus on ggplot2, which implements the grammar of graphics: a coherent system for describing the components that make up a visual representation of data. For more information on the principles and thinking behind the ggplot2 graphics system, please refer to “A Layered Grammar of Graphics” by Hadley Wickham (@hadleywickham).
The advantage of ggplot2 is that it allows R users to create publication quality graphics with just a few lines of code. ggplot2 has a large user base and is constantly developed and extended by the community.
ggplot2 is a core member of the tidyverse family of packages. Installing and loading the package of the same name (tidyverse) will load all of the packages we will need for this workshop. Let’s get started!
# install.packages("tidyverse")
# install.packages("palmerpenguins")
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ ggplot2 3.3.3 ✓ purrr 0.3.4
## ✓ tibble 3.1.1 ✓ dplyr 1.0.5
## ✓ tidyr 1.1.3 ✓ stringr 1.4.0
## ✓ readr 1.4.0 ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
If the code above produces the error “there is no package called ‘tidyverse’”, uncomment the corresponding install.packages() line (remove the #) and run it before you load the library. You only need to install a package once, but you will have to reload it, using the library() command, every time you restart R.
Today we will be working with the penguins dataset from the palmerpenguins package. The data were collected and made available by Dr. Kristen Gorman and the Palmer Station, Antarctica LTER, a member of the Long Term Ecological Research Network.
First let’s load the data.
penguins <- palmerpenguins::penguins
Here, we grab the data called penguins from the palmerpenguins package, and assign (<-) it to an object in our R environment that we also call penguins. Notice how you can see it in your environment pane in RStudio. You can have a look at the content of the penguins data frame by simply typing penguins either in the R-chunk or in the console. A data frame is a rectangular collection of data, where variables are organized as columns and observations are listed as rows.
penguins
## # A tibble: 344 x 8
## species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
## <fct> <fct> <dbl> <dbl> <int> <int>
## 1 Adelie Torgersen 39.1 18.7 181 3750
## 2 Adelie Torgersen 39.5 17.4 186 3800
## 3 Adelie Torgersen 40.3 18 195 3250
## 4 Adelie Torgersen NA NA NA NA
## 5 Adelie Torgersen 36.7 19.3 193 3450
## 6 Adelie Torgersen 39.3 20.6 190 3650
## 7 Adelie Torgersen 38.9 17.8 181 3625
## 8 Adelie Torgersen 39.2 19.6 195 4675
## 9 Adelie Torgersen 34.1 18.1 193 3475
## 10 Adelie Torgersen 42 20.2 190 4250
## # … with 334 more rows, and 2 more variables: sex <fct>, year <int>
The dataset contains the following fields: species, island, bill_length_mm, bill_depth_mm, flipper_length_mm, body_mass_g, sex and year.
More information about the package and the data is available in help. Just type ?penguins in console, located in the bottom panel of your RStudio, or type penguins in the search field of the Help tab of the bottom-right RStudio panel. Whenever you are unsure about anything in R, it is a good idea to check out the help file using one of the two methods described above.
Here’s a question that we would like to answer using the penguins data: Do penguins with high body mass also have long beaks? This might seem like a silly question, but it gets us exploring our data.
To plot penguins, run the following code in the R-chunk or in console. The following code will put body_mass_g on the x-axis and bill_length_mm on the y-axis:
ggplot(data = penguins) +
geom_point(
mapping = aes(x = body_mass_g,
y = bill_length_mm)
)
## Warning: Removed 2 rows containing missing values (geom_point).
Note that we split the code over several lines. In R, any function has a name and is followed by parentheses. Inside the parentheses we place any information the function needs to run. Here, we are using two main functions, ggplot() and geom_point(). To keep lines short, we have placed each function on its own line and also split the arguments over several lines. How you do this is up to you; R does not enforce any particular layout. We will use the tidyverse coding style throughout this course, to be consistent and to save space on the screen. The plus sign indicates that the plot is not finished yet and that the next line should be interpreted as an additional layer added to the preceding ggplot() call. In other words, when writing a ggplot() call spanning several lines, the + sign goes at the end of the line, not at the beginning.
The plot shows a positive, roughly linear relationship between body mass and bill length.
Does this graph confirm or disprove your initial hypothesis about the relationship between these variables?
Note that in order to create a plot using the ggplot2 system, you should start your command with the ggplot() function. It creates an empty coordinate system and initializes the dataset to be used in the graph (which is supplied as the first argument to the ggplot() function). To create a graphical representation of the data, we add one or more layers to our otherwise empty graph. Functions starting with the prefix geom_ create a visual representation of data. In this case we added a scatter of points, using the geom_point() function. There are many geoms in ggplot2, some of which we will learn in this lesson.
geom_ functions map variables from the previously defined dataset to certain aesthetic elements of the graph, such as axes, shapes or colours. The first argument of any geom_ function expects the user to specify these mappings, wrapped in the aes() (short for aesthetics) function. In this case, we mapped the body_mass_g and bill_length_mm variables from the penguins dataset to the x- and y-axis, respectively (using the x and y arguments of the aes() function).
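To make the idea of layers concrete, here is a minimal sketch that stores the plot in an object and counts its layers before and after adding a geom. The object names base_plot and scatter are only illustrative, and the sketch assumes the ggplot2 and palmerpenguins packages are installed.

```r
library(ggplot2)
penguins <- palmerpenguins::penguins

# ggplot() on its own only registers the data and an empty coordinate system.
base_plot <- ggplot(data = penguins)
length(base_plot$layers)   # 0

# Adding a geom_ function with `+` appends one layer to the plot.
scatter <- base_plot +
  geom_point(mapping = aes(x = body_mass_g,
                           y = bill_length_mm))
length(scatter$layers)     # 1

# Printing the object is what actually draws the plot.
print(scatter)
```

Storing a plot in an object like this is optional; typing the whole expression directly, as in the rest of the lesson, draws it straight away.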
Generally speaking, the template for visualizing data in ggplot2 can be summarized as follows:
`ggplot(data = <DATA>) +
<GEOM_FUNCTION>(mapping = aes(<MAPPINGS>))`
In the remainder of this lesson we will learn how to extend and complete this template using different elements to produce various visualizations. First, we will look closer at the <MAPPINGS> component.
You can run assignments in your own RStudio, or run the first challenge in the plotting tutorial by entering the following in the R console:
learnr::run_tutorial("001-ggplot2", "swc.tidyverse")
(helpers, please paste this into the chat at the right time.)
1a: How does bill length change over time? What do you observe? Note that many points are plotted on top of each other. This is called “overplotting”.
Hint: the penguins dataset has a column called year, which should appear on the x-axis.
1b: Try a different geom_ function called geom_jitter. It will spread the points apart a little bit using random noise.
1c: See if you can visualize bill length by island. Which island tends to have the longest bills (notice the density of the points along the y-axis)? The shortest bills? Which island has the highest spread in bill length values? How about the lowest spread?
## 1a
ggplot(data = penguins) +
geom_point(
mapping = aes(x = year,
y = bill_length_mm)
)
## Warning: Removed 2 rows containing missing values (geom_point).
# 1b
ggplot(data = penguins) +
geom_jitter(
mapping = aes(x = year,
y = bill_length_mm)
)
## Warning: Removed 2 rows containing missing values (geom_point).
## 1c
ggplot(data = penguins) +
geom_point(
mapping = aes(x = island,
y = bill_length_mm)
)
## Warning: Removed 2 rows containing missing values (geom_point).
What if we want to combine graphs from the previous two challenges and show the relationship between three variables in the same graph? It turns out that we don’t necessarily need a third geometrical dimension; we can simply employ colour.
The following graph maps the island variable from the penguins dataset to the colour aesthetic of the plot. Let’s take a look:
ggplot(data = penguins) +
geom_jitter(
mapping = aes(x = year,
y = bill_length_mm,
colour = island)
)
## Warning: Removed 2 rows containing missing values (geom_point).
You can run assignments in your own RStudio, or run the second challenge in the plotting tutorial by entering the following in the R console:
learnr::run_tutorial("001-ggplot2", "swc.tidyverse")
(helpers, please paste this into the chat at the right time.)
2a: What will happen if you switch the mappings of island and year in the previous example? Is the graph still useful? Why? Try mapping year to colour.
2b: What if you map the colour aesthetic to species? What has changed? How is year different from species? What is the limitation of the colour aesthetic, when used to visualize different types of data?
2c: Can you add a little colour to our initial graph of body mass by bill length? Colour the points by island.
2d: How about using a colour gradient to illustrate change over time?
## 2a
ggplot(data = penguins) +
geom_jitter(
mapping = aes(x = island,
y = bill_length_mm,
colour = year)
)
## Warning: Removed 2 rows containing missing values (geom_point).
# 2b
ggplot(data = penguins) +
geom_jitter(
mapping = aes(x = island,
y = bill_length_mm,
colour = species)
)
## Warning: Removed 2 rows containing missing values (geom_point).
## 2c
ggplot(data = penguins) +
geom_jitter(
mapping = aes(x = body_mass_g,
y = bill_length_mm,
colour = island)
)
## Warning: Removed 2 rows containing missing values (geom_point).
# 2d
ggplot(data = penguins) +
geom_jitter(
mapping = aes(x = body_mass_g,
y = bill_length_mm,
colour = year)
)
## Warning: Removed 2 rows containing missing values (geom_point).
There are other aesthetics that can come in handy. One of them is size. The idea is that we can vary the size of data points to illustrate another continuous variable, such as bill depth. Let’s look at four dimensions at once!
ggplot(data = penguins) +
geom_point(
mapping = aes(x = body_mass_g,
y = bill_length_mm,
colour = island,
size = bill_depth_mm)
)
## Warning: Removed 2 rows containing missing values (geom_point).
There’s one more useful aesthetic property of the graph which is good for visualizing low-cardinality categorical variables (categorical variables with small number of unique values), called shape. The idea is that you can employ different shapes (other than circles) to plot the data.
You can run assignments in your own RStudio, or run the third challenge in the plotting tutorial by entering the following in the R console:
learnr::run_tutorial("001-ggplot2", "swc.tidyverse")
(helpers, please paste this into the chat at the right time.)
3: Blow your mind by visualizing five(!) dimensions in the same graph. Modify the previous example, mapping year to colour and island to shape.
ggplot(data = penguins) +
geom_point(
mapping = aes(x = body_mass_g,
y = bill_length_mm,
colour = year,
shape = island,
size = bill_depth_mm)
)
## Warning: Removed 2 rows containing missing values (geom_point).
Combining too many aesthetics in the same graph can make it quite busy. However, you can always remove certain aesthetic properties and use several graphs to highlight different aspects of data.
Until now, we explored different aesthetic properties of a graph mapped to certain variables. What if you want to recolour or use a certain shape to plot all data points? Well, that means that such a colour or shape will no longer be mapped to any data, so you need to supply it to the geom_ function as a separate argument (outside of the mapping). This is called “setting” in the ggplot2 world. We “map” aesthetics to data columns, or we “set” single values outside aes() to apply to the entire geom. Here’s our initial graph with all points coloured in blue.
ggplot(data = penguins) +
geom_point(
mapping = aes(x = body_mass_g,
y = bill_length_mm),
colour = "blue"
)
## Warning: Removed 2 rows containing missing values (geom_point).
Once more, observe that the colour is now not mapped to any particular variable from the penguins dataset and applies equally to all data points. It is therefore outside the mapping argument and is not wrapped in the aes() function. Note that set colours are supplied as character strings (in quotes).
You can run assignments in your own RStudio, or run the fourth challenge in the plotting tutorial by entering the following in the R console:
learnr::run_tutorial("001-ggplot2", "swc.tidyverse")
(helpers, please paste this into the chat at the right time.)
4a: Try mapping the colour aesthetic to island and then to year. What do you notice? What might be the reason for different treatment of these variables by ggplot?
4b: Change the transparency of the data points by year.
4c: Move the transparency outside the aes() and set it to 0.7. What can be the benefit of each one of these methods?
4d: Add a colour argument, with ‘blue’ in quotations, into the aes() and see what happens. Did you expect that?
## 4a
ggplot(data = penguins) +
geom_point(
mapping = aes(x = body_mass_g,
y = bill_length_mm,
colour = year)
)
## Warning: Removed 2 rows containing missing values (geom_point).
## 4b
ggplot(data = penguins) +
geom_point(
mapping = aes(x = body_mass_g,
y = bill_length_mm,
alpha = year)
)
## Warning: Removed 2 rows containing missing values (geom_point).
## 4c
ggplot(data = penguins) +
geom_point(
mapping = aes(x = body_mass_g,
y = bill_length_mm),
alpha = 0.7)
## Warning: Removed 2 rows containing missing values (geom_point).
## 4d
ggplot(data = penguins) +
geom_point(
mapping = aes(x = body_mass_g,
y = bill_length_mm,
colour = "blue")
)
## Warning: Removed 2 rows containing missing values (geom_point).
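When a quoted value is placed inside aes(), like “blue” here, ggplot treats it as data: a constant variable whose only value is the text “blue”, not the colour blue! That value is then assigned a colour from the default colour scale, so the points are not drawn in blue and a legend with the single entry “blue” appears.
Next, we will learn how to label the chart using the labs() function (individual annotations can be added with the annotate() function).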
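The difference between setting and mapping can also be checked without looking at the picture. The sketch below builds both versions of the plot and uses layer_data(), a ggplot2 helper that returns the values computed for a layer, to see which colour each point actually receives. The names p_set and p_map are only illustrative.

```r
library(ggplot2)
penguins <- palmerpenguins::penguins

# Setting: the colour is a fixed value, supplied outside aes().
p_set <- ggplot(data = penguins) +
  geom_point(mapping = aes(x = body_mass_g,
                           y = bill_length_mm),
             colour = "blue")

# Mapping: the string "blue" inside aes() is treated as data.
p_map <- ggplot(data = penguins) +
  geom_point(mapping = aes(x = body_mass_g,
                           y = bill_length_mm,
                           colour = "blue"))

# Every point in the first plot gets the colour we asked for ...
unique(layer_data(p_set)$colour)

# ... while the second plot gets a colour chosen by the default discrete scale.
unique(layer_data(p_map)$colour)
```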
ggplot(data = penguins) +
geom_point(
aes(x = bill_depth_mm,
y = bill_length_mm,
colour = island)
) +
facet_wrap(~year) +
labs(
title = "Bill depth vs bill length over time",
subtitle = "In the past 3 years, the relationship between bill depth and length in penguins does not seem to change",
caption = "Originally published in doi:10.1371/journal.pone.0090081",
x = "Bill depth (mm)",
y = "Bill length (mm)",
colour = "Island"
)
## Warning: Removed 2 rows containing missing values (geom_point).
Next, we will consider different geoms for our example graph. Using different geom_ functions, you can highlight different aspects of the data. For example, we could connect individual data points belonging to the same species into a line and illustrate the development of bill length over time for each species separately using the geom_line() function.
Some geom_ functions allow additional aesthetics, such as the group aesthetic in the geom_line() function. This aesthetic may not have any meaning in other geoms, but here it allows us to draw multiple lines, one per species. To keep the lines organized, we will also colour them by species.
ggplot(data = penguins) +
geom_line(
mapping = aes(x = year,
y = bill_length_mm,
group = species,
colour = species)
)
That looks crazy! Line plots are not ideal for this data, as there are many observations per species and year, so the line zigzags between them instead of showing a trend.
Another useful geom function is geom_boxplot(). It adds a layer with a “box and whiskers” plot illustrating the distribution of values within categories. The following chart breaks down bill length by island, where the box spans the first to the third quartile (the 25th and 75th percentiles), the middle bar marks the median, and each whisker extends to the most extreme value within 1.5 times the interquartile range of the box. Points beyond the whiskers are shown separately as outliers.
ggplot(data = penguins) +
geom_boxplot(
mapping = aes(x = island,
y = bill_length_mm)
)
## Warning: Removed 2 rows containing non-finite values (stat_boxplot).
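To see where the box and whiskers come from, the numbers can be recomputed by hand. The following sketch does this for a single island (Dream, chosen arbitrarily) using base R only; the variable names are illustrative, and the 1.5 multiplier is the geom_boxplot() default.

```r
library(ggplot2)
penguins <- palmerpenguins::penguins

# Bill lengths for one island, without missing values.
bill <- penguins$bill_length_mm[penguins$island == "Dream"]
bill <- bill[!is.na(bill)]

# Box: first quartile, median and third quartile.
box <- quantile(bill, probs = c(0.25, 0.5, 0.75), names = FALSE)
iqr <- box[3] - box[1]

# Whiskers: the most extreme observations within 1.5 * IQR of the box.
lower_fence <- box[1] - 1.5 * iqr
upper_fence <- box[3] + 1.5 * iqr
whiskers <- range(bill[bill >= lower_fence & bill <= upper_fence])

# Anything beyond the fences is drawn as an individual outlier point.
outliers <- bill[bill < lower_fence | bill > upper_fence]

c(whisker_low = whiskers[1], q1 = box[1], median = box[2],
  q3 = box[3], whisker_high = whiskers[2])
```

These five values should correspond to the ymin, lower, middle, upper and ymax columns that layer_data() reports for the Dream boxplot.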
Layers can be added on top of each other. In the following graph we will place the boxplots over jittered points to see the distribution of the individual observations more clearly. We can map two aesthetic properties to the same variable. Here we will also use a different colour for each island.
ggplot(data = penguins) +
geom_jitter(
mapping = aes(x = island,
y = bill_length_mm,
colour = island)
) +
geom_boxplot(
mapping = aes(x = island,
y = bill_length_mm,
colour = island)
)
## Warning: Removed 2 rows containing non-finite values (stat_boxplot).
## Warning: Removed 2 rows containing missing values (geom_point).
Now, this was slightly inefficient due to duplication of code - we had to specify the same mappings for two layers. To avoid it, you can move common arguments of geom_ functions to the main ggplot() function. In this case every layer will “inherit” the same arguments, specified in the “parent” function.
ggplot(data = penguins,
mapping = aes(x = island,
y = bill_length_mm,
colour = island)
) +
geom_jitter() +
geom_boxplot()
## Warning: Removed 2 rows containing non-finite values (stat_boxplot).
## Warning: Removed 2 rows containing missing values (geom_point).
You can still add layer-specific mappings or other arguments by specifying them within individual geoms. We would recommend building each layer separately and then moving common arguments up to the “parent” function.
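As a sketch of that recommendation, the plot below keeps the shared x and y mappings in ggplot() and adds a colour mapping to the jitter layer only. Colouring the points by species and giving the boxes no fill (fill = NA) are illustrative choices, not part of the lesson’s own examples.

```r
library(ggplot2)
penguins <- palmerpenguins::penguins

# x and y are shared, so they live in ggplot() and are inherited by every layer.
p <- ggplot(data = penguins,
            mapping = aes(x = island,
                          y = bill_length_mm)) +
  # colour is wanted for the points only, so it is mapped in this layer
  geom_jitter(mapping = aes(colour = species),
              alpha = 0.5) +
  # the boxplot inherits x and y, but not colour
  geom_boxplot(fill = NA)

print(p)
```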
We can use linear models to highlight differences in the relationship between bill depth and bill length by island. Notice that we added a separate argument to the geom_smooth() function to specify the type of model we want ggplot2 to build using the data (a linear model). The geom_smooth() function has also helpfully provided confidence intervals, indicating “goodness of fit” for each model (shaded gray area). For more information on statistical models, please refer to help (by typing ?geom_smooth).
ggplot(data = penguins,
mapping = aes(x = bill_depth_mm,
y = bill_length_mm,
colour = island)
) +
geom_point(alpha = 0.5) +
geom_smooth(method = "lm")
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 2 rows containing non-finite values (stat_smooth).
## Warning: Removed 2 rows containing missing values (geom_point).
Notice that we also used a new visual property called alpha to increase the transparency of the data points and make the trend lines stand out. The alpha property can also be used as a mapped aesthetic, i.e. transparency can be made to vary depending on the value of a certain variable.
You can run assignments in your own RStudio, or run the fifth challenge in the plotting tutorial by entering the following in the R console:
learnr::run_tutorial("001-ggplot2", "swc.tidyverse")
(helpers, please paste this into the chat at the right time.)
5a. Modify the graph to force R to create a single regression line for all data points. Keep the points coloured by island.
5b. Add a regression line to the plot that plots one line for each species, while also plotting one across all species.
In the graph above, each geom inherited all three mappings: x, y and colour. If we want only a single linear model to be built, we need to limit the effect of the colour aesthetic to the geom_point() function, by moving it from the “parent” function to the layer where we want it to apply. Note, though, that because we still want the colour to be mapped to the island variable, it needs to be wrapped in the aes() function and supplied to the mapping argument.
# 5a
ggplot(data = penguins,
mapping = aes(x = bill_depth_mm,
y = bill_length_mm)) +
geom_point(mapping = aes(colour = island),
alpha = 0.5) +
geom_smooth(method = "lm")
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 2 rows containing non-finite values (stat_smooth).
## Warning: Removed 2 rows containing missing values (geom_point).
# Alternative solution
ggplot(data = penguins,
mapping = aes(x = bill_depth_mm,
y = bill_length_mm,
colour = island)) +
geom_point(alpha = 0.5) +
geom_smooth(method = "lm",
colour = "black")
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 2 rows containing non-finite values (stat_smooth).
## Warning: Removed 2 rows containing missing values (geom_point).
# 5b
ggplot(data = penguins,
mapping = aes(x = bill_depth_mm,
y = bill_length_mm)) +
geom_point(mapping = aes(colour = species),
alpha = 0.5) +
geom_smooth(method = "lm", aes(colour = species)) +
geom_smooth(method = "lm", colour = "black")
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 2 rows containing non-finite values (stat_smooth).
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 2 rows containing non-finite values (stat_smooth).
## Warning: Removed 2 rows containing missing values (geom_point).
Sometimes we have data that need to be plotted on a scale other than the native scale of the data, for instance when the data follow some type of exponential or hyperbolic distribution. There are a couple of ways to do this in ggplot2. The first thing we can try is transforming the data with a function, like log() for a logarithmic transformation.
ggplot(data = penguins,
mapping = aes(x = log(body_mass_g),
y = bill_length_mm,
colour = island)
) +
geom_point() +
geom_smooth(method = "lm")
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 2 rows containing non-finite values (stat_smooth).
## Warning: Removed 2 rows containing missing values (geom_point).
As you can observe, the x-axis label of our graph now says log(body_mass_g), which indicates that we are not plotting the original data, but rather the output of the log() function. A similar effect (with a more readable x-axis) can be achieved by specifying the x-axis scale transformation as a separate layer. Instead of transforming the values, we will transform the scale of the x-axis.
ggplot(data = penguins,
mapping = aes(x = body_mass_g,
y = bill_length_mm,
colour = island)
) +
geom_point() +
geom_smooth(method = "lm") +
scale_x_log10()
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 2 rows containing non-finite values (stat_smooth).
## Warning: Removed 2 rows containing missing values (geom_point).
Now the x-axis is drawn on a log10 scale, with tick labels in the original units, and the data plotted on this scale look more linear. Certain scale and coordinate functions may result in similar visual effects on the chart, but the way they interact with other aesthetic elements may be quite different. Check out the online ggplot2 documentation for more details and examples of using scale and coordinate transformations.
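A quick way to convince yourself that the two approaches position the points identically is to compare the values ggplot2 computes for each plot. This sketch uses log10() inside aes() (rather than the natural logarithm used earlier) so that both versions use the same base; p_values and p_scale are illustrative names.

```r
library(ggplot2)
penguins <- palmerpenguins::penguins

# Option 1: transform the values inside aes(); the axis shows log10 values.
p_values <- ggplot(data = penguins,
                   mapping = aes(x = log10(body_mass_g),
                                 y = bill_length_mm)) +
  geom_point()

# Option 2: transform the scale; the axis stays labelled in grams.
p_scale <- ggplot(data = penguins,
                  mapping = aes(x = body_mass_g,
                                y = bill_length_mm)) +
  geom_point() +
  scale_x_log10()

# Compare the horizontal positions computed for the points in each plot.
all.equal(layer_data(p_values)$x, layer_data(p_scale)$x)
```

The practical difference is therefore in the axis: the second version keeps the tick labels in the original units, which is usually easier to read.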
You can run assignments in your own RStudio, or run the sixth challenge in the plotting tutorial by entering the following in the R console:
learnr::run_tutorial("001-ggplot2", "swc.tidyverse")
(helpers, please paste this into the chat at the right time.)
6a: Make a boxplot of body mass by year. When was the interquartile range of body mass the smallest?
6b: Make a histogram of body_mass_g. What is the shape of the distribution? Try setting bins to 50. Why is the bins parameter important for interpretation of the histogram?
6c: Build a density function. How would you compare density functions of different islands?
6d: Based on the graph produced using the geom_density2d() function of log bill length vs body mass, how many clusters of data points can you identify? What if you look at it by island?
## 6a
ggplot(penguins) +
geom_boxplot(
mapping = aes(y = body_mass_g,
x = year,
group = year)
)
## Warning: Removed 2 rows containing non-finite values (stat_boxplot).
## 6b
ggplot(penguins) +
geom_histogram(
mapping = aes(x = body_mass_g),
bins = 50
)
## Warning: Removed 2 rows containing non-finite values (stat_bin).
## 6c
ggplot(penguins) +
geom_density(
mapping = aes(x = body_mass_g,
colour = island)
)
## Warning: Removed 2 rows containing non-finite values (stat_density).
## 6d
ggplot(penguins) +
geom_density2d(
mapping = aes(x = body_mass_g,
y = bill_length_mm,
colour = island)
)
## Warning: Removed 2 rows containing non-finite values (stat_density2d).
Multi-layered graphs employing several aesthetics can look crowded. To avoid this, one can split the data across several panels of similar graphs. In ggplot2 this method is called “faceting”. Let’s facet the graph above by island and show the data points and the trend for each island in a separate chart.
ggplot(data = penguins,
mapping = aes(x = body_mass_g,
y = bill_length_mm)
) +
geom_point() +
geom_smooth() +
facet_wrap(~ island)
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## Warning: Removed 2 rows containing non-finite values (stat_smooth).
## Warning: Removed 2 rows containing missing values (geom_point).
The facet_wrap() layer takes a “formula” as its argument, denoted by the tilde (~). This tells R to draw a panel for each unique value in the island column of the penguins dataset. Faceting is useful when the number of panels is limited. Notice that here R places panels from left to right, “wrapping” those panels that do not fit in one row onto a new row. Learn about advanced faceting, including faceting over several variables, in the help for ?facet_grid.
Reiterating our previously proposed ggplot2 template and adding what we have learned until now, we can state:
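As a small illustration of faceting over two variables, the sketch below uses facet_grid() with one row of panels per species and one column per island. The choice of species and island is only an example; combinations that do not occur in the data simply produce empty panels.

```r
library(ggplot2)
penguins <- palmerpenguins::penguins

# One row of panels per species and one column of panels per island.
p <- ggplot(data = penguins,
            mapping = aes(x = body_mass_g,
                          y = bill_length_mm)) +
  geom_point() +
  facet_grid(rows = vars(species),
             cols = vars(island))

print(p)
```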
`ggplot(data = <DATA>) +
<GEOM_FUNCTION>(mapping = aes(<MAPPINGS>)) +
<FACET_FUNCTION>`
You can run assignments in your own RStudio, or run the seventh challenge in the plotting tutorial by entering the following in the R console:
learnr::run_tutorial("001-ggplot2", "swc.tidyverse")
(helpers, please paste this into the chat at the right time.)
7a: Try faceting by year, keeping the linear smoother. Is there any change in slope of the linear trend over the years?
7b: What if you look at linear models per island?
## 7a
ggplot(data = penguins,
mapping = aes(x = body_mass_g,
y = bill_length_mm)
) +
geom_point() +
geom_smooth(method = "lm") +
facet_wrap(~ year)
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 2 rows containing non-finite values (stat_smooth).
## Warning: Removed 2 rows containing missing values (geom_point).
## 7b
ggplot(data = penguins,
mapping = aes(x = body_mass_g,
y = bill_length_mm)
) +
geom_point() +
geom_smooth(method = "lm") +
facet_wrap( ~ island)
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 2 rows containing non-finite values (stat_smooth).
## Warning: Removed 2 rows containing missing values (geom_point).
Often, people want to change the colours in their plots or the general theme to something more suitable to their needs. Journals often have specific requirements for how plots should look, or you may need to use specific colours across different types of visualisations.
Changing the colours is done through the scale_fill and scale_colour functions.
ggplot(penguins)+
geom_density(
aes(x = body_mass_g,
colour = island)
) +
scale_colour_brewer(palette = "Dark2")
## Warning: Removed 2 rows containing non-finite values (stat_density).
This plot has mapped colour on a density, which colours only the outline of each density curve. This is not very clear, however, and the colours are not easy to see. We can try switching the colour to a “fill” mapping instead, and also change the scale to fill, which should flood the density area.
Did you notice something else that is different in this code? We have left out the data = and mapping = argument names! In the ggplot() call, data is the first argument, so ggplot2 knows that the first input is the data. Likewise, mapping is the first argument of every geom, so you don’t need to write mapping = before aes()!
ggplot(penguins)+
geom_density(
aes(x = body_mass_g,
fill = island),
alpha = .5
) +
scale_fill_brewer(palette = "Dark2")
## Warning: Removed 2 rows containing non-finite values (stat_density).
Here, we are using the brewer palettes to colour. These have the very convenient scale_[]_brewer() functions we can use. We can also use our own palettes if we want!
ggplot(penguins)+
geom_density(
aes(x = body_mass_g,
fill = island),
alpha = .5
) +
scale_fill_manual(values = c("firebrick", "dodgerblue", "forestgreen"))
## Warning: Removed 2 rows containing non-finite values (stat_density).
It’s often smart to use curated palettes though, as they usually include colour scales that give good distinctions between colours. Another popular set of palettes, similar to brewer, is the viridis colours.
ggplot(penguins)+
geom_density(
aes(x = body_mass_g,
fill = island),
alpha = .5
) +
scale_fill_viridis_d()
## Warning: Removed 2 rows containing non-finite values (stat_density).
You can run assignments in your own RStudio, or run the eighth challenge in the plotting tutorial by entering the following in the R console:
learnr::run_tutorial("001-ggplot2", "swc.tidyverse")
(helpers, please paste this into the chat at the right time.)
8a: Make a boxplot of body mass by year. What happens if you add factor() around year? What do you need to change in the scale_fill function to make it work?
8b: Make a histogram of body_mass_g. What is the shape of the distribution? Why is the bins parameter important for interpretation of the histogram?
8c: Build a density plot. How would you compare density functions of different islands?
## 8a
ggplot(penguins) +
geom_boxplot(
mapping = aes(y = body_mass_g,
x = factor(year),
fill = factor(year))
) +
scale_fill_viridis_d()
## Warning: Removed 2 rows containing non-finite values (stat_boxplot).
## 8b
ggplot(penguins) +
geom_point(
mapping = aes(x = body_mass_g,
y = bill_length_mm,
colour = body_mass_g)
) +
scale_colour_viridis_c()
## Warning: Removed 2 rows containing missing values (geom_point).
## 8c
ggplot(penguins)+
geom_density2d(
aes(x = body_mass_g,
y = bill_length_mm,
colour = island)
) +
scale_colour_brewer(palette = "Dark2")
## Warning: Removed 2 rows containing non-finite values (stat_density2d).
Now that we can change the colours, we might want to change the general look of the plot. Not everyone likes the grey background, the grid lines, etc.
In ggplot2, we change these non-data parts of the plot through the theme functions. There are several built-in themes that provide common layouts people might use. For instance, theme_classic() is often used when preparing plots for journals that have very specific requirements for plot composition.
ggplot(penguins) +
geom_boxplot(
mapping = aes(y = body_mass_g,
x = factor(year),
fill = factor(year))
) +
scale_fill_viridis_d() +
theme_classic()
## Warning: Removed 2 rows containing non-finite values (stat_boxplot).
In addition to the built-in themes, you can control everything in the plot layout through the theme() function. While the function name is simple, altering theme elements can be quite difficult, as there are many elements and options to choose from. But there are some specific arguments people often like to use, like repositioning where the legend appears.
ggplot(penguins) +
geom_boxplot(
mapping = aes(y = body_mass_g,
x = factor(year),
fill = factor(year))
) +
scale_fill_viridis_d() +
theme(
legend.position = "bottom"
)
## Warning: Removed 2 rows containing non-finite values (stat_boxplot).
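A built-in theme and individual theme() tweaks can also be combined. The sketch below starts from theme_classic() and then adjusts three elements; the particular elements and values (axis title size, light horizontal grid lines) are arbitrary examples rather than recommendations. The order matters: adding a complete theme such as theme_classic() after theme() would overwrite the tweaks.

```r
library(ggplot2)
penguins <- palmerpenguins::penguins

p <- ggplot(penguins) +
  geom_boxplot(
    mapping = aes(x = factor(year),
                  y = body_mass_g,
                  fill = factor(year))
  ) +
  scale_fill_viridis_d() +
  # start from a complete built-in theme ...
  theme_classic() +
  # ... then override individual elements
  theme(
    legend.position = "bottom",
    axis.title = element_text(size = 12),
    panel.grid.major.y = element_line(colour = "grey90")
  )

print(p)
```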
You can run assignments in your own RStudio, or run the ninth challenge in the plotting tutorial by entering the following in the R console:
learnr::run_tutorial("001-ggplot2", "swc.tidyverse")
(helpers, please paste this into the chat at the right time.)
9a: Create a plot and alter the theme. Try the dark theme, for instance!
9b: Edit the theme and make the plot as ugly as you can! Use both the theme and scales for the colours to find the most horrible combinations!
## 9a
ggplot(penguins) +
geom_boxplot(
mapping = aes(y = body_mass_g,
x = factor(year),
fill = factor(year))
) +
theme_dark()
## Warning: Removed 2 rows containing non-finite values (stat_boxplot).
## 9b
ggplot(penguins)+
geom_density2d(
aes(x = body_mass_g,
y = bill_length_mm,
colour = island)
) +
scale_colour_brewer(palette = "Dark2")
## Warning: Removed 2 rows containing non-finite values (stat_density2d).
We conclude this lesson by reiterating our ggplot2 data visualization template.
`ggplot(data = <DATA>,
mapping = aes(<GLOBAL_MAPPINGS>)) +
<GEOM_FUNCTION>(
mapping = aes(<GEOM_MAPPINGS>)
) +
<SCALE_FUNCTION> +
<COORDINATE_FUNCTION> +
<FACET_FUNCTION> +
<LABS>`
We learned about seven parameters of ggplot functions. However, it is very rare that all seven of them need to be specified in a given graphic or chart. Most of the time ggplot offers useful defaults for everything other than data, geoms and mappings.
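As an illustration, the template could be filled in as follows. This is only one possible combination: coord_cartesian() and its y-axis limits are assumptions made for the example (coordinate functions were not covered in detail above), and the label text is our own.

```r
library(ggplot2)
penguins <- palmerpenguins::penguins

p <- ggplot(data = penguins,                      # <DATA>
            mapping = aes(x = body_mass_g,        # <GLOBAL_MAPPINGS>
                          y = bill_length_mm)) +
  geom_point(                                     # <GEOM_FUNCTION>
    mapping = aes(colour = species),              # <GEOM_MAPPINGS>
    alpha = 0.5
  ) +
  scale_colour_brewer(palette = "Dark2") +        # <SCALE_FUNCTION>
  coord_cartesian(ylim = c(30, 60)) +             # <COORDINATE_FUNCTION>
  facet_wrap(~ island) +                          # <FACET_FUNCTION>
  labs(                                           # <LABS>
    x = "Body mass (g)",
    y = "Bill length (mm)",
    colour = "Species"
  )

print(p)
```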